ABSTRACT

The problem of partitioning a time series into segments is considered.
The segments fall into classes, which may correspond to phases of a cycle
(recession, recovery, expansion in the business cycle), to portions of a
signal obtained by scanning (background/clutter, target, background/clutter
again, another target, etc.), or to normal tissue, tumor, normal tissue in
medical applications. A probability distribution is associated with each
class of segment. Parametric families of distributions are considered, a set
of parameter values being associated with each class. With each observation
is associated an unobservable label, indicating from which class the
observation arose. The label process is modeled as a Markov chain.
Segmentation algorithms are obtained by applying a method of iterated
maximum likelihood to the resulting likelihood function. In this paper
special attention is given to the situation in which the observations are
conditionally independent, given the labels. A numerical example is given.
Choice of the number of classes, using Akaike's information criterion (AIC)
for model identification, is illustrated.

Similar  ideas  are  applied  to  the  problem  of  segmenting  digital  images, 
where  possible  applications  include  SEASAT  (and  LANDSAT)  multi-spectral 
images.  A  numerical  example  is  given. 
1.  Introduction

Problems  of  segmenting  time  series  or  digital  images  are  considered. 
In  both  cases  the  observations  fall  into  classes,  and  this  assignment 
of  observations  to  classes,  or  "labeling,"  is  unknown.  Thus  each  point 
gives  rise  to  a  pair,  the  observation  itself,  together  with  its  unknown 
label.  In  the  context  of  this  model,  segmentation  is  estimation  of  the 
labels.  In  time  series  the  points  are  time  points;  in  digital  images, 
the  points  are  points  of  the  image  (picture  elements,  or  pixels).  An 
image is two-dimensional, while time is one-dimensional, so time series
are treated first.

2.  Segmentation  of  Time  Series 

The problem of segmentation considered here is: Given a time series

$$\{x_t,\ t = 1, 2, \ldots, n\},$$

partition the set of values of t into sub-series (segments, regimes) for
which the values $x_t$ are relatively homogeneous. The segments are
assumed to fall into several classes. In cyclic processes the classes are
phases of the cycle.

Examples. (i) Segment a received signal into background, target,
background again, another target, etc. (ii) Segment an EEG of a sleeping
person into periods of deep sleep and restless or fitful sleep (two classes of
segment). (iii) Segment an ECG into rhythmic and arrhythmic periods (two
classes of segment). (iv) Segment an economic time series into periods of
recession, recovery, and expansion. Here there are three classes of segment.

In some applications the observation X is a vector of several measurements.
E.g., for blood pressure, X is a vector of the two measurements,
systolic  and  diastolic.  The  discussion  of  segmentation  of  digital  images, 
later  in  the  paper,  will  be  in  terms  of  vector  measurements.  Time  series 
will  be  discussed  in  terms  of  scalar  (single)  measurements,  although 
the  ideas  and  methods  readily  generalize  to  multiple  time  series. 

2.1.  The  Model 

One can imagine a series which is usually relatively smooth but
occasionally rather jumpy as being composed of sub-series which are
first-order autoregressive, the autocorrelation coefficient being positive
for the smooth segments and negative for the jumpy ones. One might try
fitting such data with a segmentation of two classes, one corresponding
to a positive autocorrelation, the other to a negative autocorrelation.

The mechanism generating the process changes from time to time, and
these changes manifest themselves at some unknown time points (epochs, or
change-points)

$$t_1, t_2, \ldots, t_{m-1},$$

i.e., the number of segments is m. The integer m and the epochs are unknown.
Generally there will be fewer than m generating mechanisms. The number of
mechanisms (classes) will be denoted by k; it will be assumed that k is at
most m. In some situations, k is specified; in others, it is not. With the
c-th class is associated a stochastic process, $P_c$, say. E.g., above we
spoke of a situation with k = 2 classes, where, for c = 1, 2, the process
$P_c$ is first-order autoregressive with coefficient, say, $\phi_c$.
Now with the t-th observation (t = 1, 2, ..., n) associate the label
$\gamma_t$, which is equal to c if and only if $x_t$ arose from class c,
c = 1, 2, ..., k. Each time-point t gives rise to a pair

$$(x_t, \gamma_t),$$

where $x_t$ is observable and $\gamma_t$ is not. The process $\{x_t\}$ is
the observed time series; $\{\gamma_t\}$ will be called the label process.


Define a segmentation, then, as a partition of the time index set
{t: t = 1, 2, ..., n} into subsets

$$S_1 = \{1, 2, \ldots, t_1\},\quad S_2 = \{t_1 + 1, \ldots, t_2\},\quad \ldots,\quad S_m = \{t_{m-1} + 1, \ldots, n\},$$

where the t's are subscripted in ascending order. Each subset $S_g$,
g = 1, 2, ..., m, is a segment. The integer m is not specified. In the
context of this model, to segment the series is merely to estimate the
$\gamma$'s.

The idea underlying the development here is that of transitions between
classes. The labels $\gamma_t$ will be treated as random variables
with transition probabilities

$$\Pr(\gamma_t = d \mid \gamma_{t-1} = c) = p_{cd},$$

taken as stationary, i.e., independent of t. The k-by-k matrix of transition
probabilities will be denoted by P, i.e.,

$$P = [p_{cd}].$$

If a process is strictly cyclic, like intake, compression, combustion,
intake, etc., for a combustion engine, or recession to recovery to expansion
to recession, etc., in the business cycle, then this condition can be imposed
by using a transition probability matrix with zeros in the appropriate places.
We shall consider a matrix like this in Section 2.3.2.
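As a minimal sketch of such a matrix, assuming that in a strictly cyclic process each state may only persist or pass to the next phase, one might build the zero pattern as follows. The function name `cyclic_transition_matrix` and the common persistence probability `stay` are illustrative choices, not part of the method described here.

```python
def cyclic_transition_matrix(k, stay=0.5):
    """k-by-k transition matrix for a strictly cyclic process (k >= 2).

    Assumption for illustration: state c either persists (probability
    `stay`) or moves to the next phase, the last phase wrapping around
    to the first. All other entries are zero.
    """
    P = [[0.0] * k for _ in range(k)]
    for c in range(k):
        P[c][c] = stay
        P[c][(c + 1) % k] = 1.0 - stay
    return P


# Three phases, e.g. recession -> recovery -> expansion -> recession.
P = cyclic_transition_matrix(3, stay=0.6)
for row in P:
    print(row)
```

In practice the nonzero entries would be estimated from the data; only the positions of the zeros are fixed in advance.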

Segmentation will involve the simultaneous estimation of several sets of
parameters: the distributional parameters of the within-class stochastic
processes, the transition probabilities, and the labels.

A joint probability density function (p.d.f.) for
$\{(X_t, \gamma_t),\ t = 1, 2, \ldots, n\}$ is obtained by successively
conditioning each variable on all the preceding ones. The working
assumptions behind the method of this paper are as follows.

(A.1)  The labels are a first-order stationary Markov chain, independent of
the observations; i.e., the probability of being in state d at time
t+1, given state c at time t, is $p_{cd}$, which does not involve t or
the values of the observations.

(A.2)  The distribution of the random variable $X_t$ depends only on its own
label and previous X's, not previous labels.

Further  details  and  a  mathematical  formulation  corresponding  to  these 

assumptions  are  given  in  Sclove  (1981). 

In regard to (A.2), in the simplest case the X's are (conditionally)
independent, given the labels. That is, the distribution of $X_t$ depends
only on its label, and not on previous X's. In the examples in the present
paper this assumption is made. In this case the p.d.f.'s
$f(x \mid \gamma_t = c)$, c = 1, 2, ..., k, are called class-conditional
densities. In the parametric case they take the form

$$f(x_t \mid \gamma_t = c) = g(x_t; \beta_c), \qquad (1)$$

where $\beta_c$ is a parameter indexing a family of p.d.f.'s of form given
by the function g. E.g., in the case of Gaussian class-conditional
distributions, $\beta_c$ consists of the mean and variance for the c-th
class.

This model, with transition probabilities, has certain advantages over
a model formulated in terms of the epochs. The epochs behave as
discrete parameters, and, even if the corresponding generalized likelihood ratio
were asymptotically chi-square, the number of degrees of freedom would not be
clear. On the other hand, the transition probabilities vary in an interval,
and it is clear that they constitute a set of k(k-1) free parameters, since
each of the k rows of P sums to one.

2.2.  An algorithm

2.2.1.  Development  of  the  algorithm 
It follows from the assumptions that the likelihood L, i.e., the joint
p.d.f. considered as a function of the parameters, can be written in the form

$$L = A(\{p_{cd}\}, \{\gamma_t\})\, B(\{\gamma_t\}, \{\beta_c\}). \qquad (2)$$

Hence, for fixed values of the $\gamma$'s and $\beta$'s, L is maximized with
respect to the p's by maximizing the factor A. Now let $n_{cd}$ be the number
of transitions from class c to d. (These n's are functions of the labels.)
The factor A is merely the point multinomial probability function, the
parameters being the n's and p's. It follows that the maximum likelihood
estimates of the p's, for fixed values of the other parameters, are
given by

$$\hat{p}_{cd} = n_{cd} / n_c, \qquad (3)$$

where

$$n_c = n_{c1} + n_{c2} + \cdots + n_{ck}.$$
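The estimate (3) amounts to counting transitions in the current label sequence and normalizing each row. A short sketch, assuming labels coded 1, ..., k and a hypothetical helper name `transition_mle`:

```python
def transition_mle(labels, k):
    """Return (counts, P_hat) for a label sequence with values 1..k.

    counts[c-1][d-1] is n_cd, the number of transitions from class c to d;
    P_hat[c-1][d-1] is n_cd / n_c as in equation (3).
    """
    counts = [[0] * k for _ in range(k)]
    for prev, curr in zip(labels, labels[1:]):
        counts[prev - 1][curr - 1] += 1
    P_hat = []
    for row in counts:
        n_c = sum(row)
        # Convention assumed here: a class with no observed departures
        # is given a row of zeros.
        P_hat.append([n / n_c if n_c else 0.0 for n in row])
    return counts, P_hat


labels = [1, 1, 2, 2, 2, 1, 2]
counts, P_hat = transition_mle(labels, 2)
print(counts)
print(P_hat)
```

The toy label sequence is invented for illustration and is not taken from the example of Section 2.3.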


Further, given the p's and $\gamma$'s, the estimates of the distributional
parameters (the $\beta$'s) are easy to obtain. This suggests the following
algorithm.

Step 0.  Set the $\beta$'s at initial trial values, suggested, e.g., by
previous knowledge of the phenomenon under study. Set the p's at
initial trial values, e.g., 1/k. Set $f(\gamma_1)$ at initial trial
values, e.g., $f(\gamma_1) = 1/k$, for $\gamma_1$ = 1, 2, ..., k.

Step 1.  Estimate $\gamma_1$ by maximizing $f(\gamma_1) f(x_1 \mid \gamma_1)$.

Step 2.  For t = 2, 3, ..., n, estimate $\gamma_t$ by maximizing

$$p_{\gamma_{t-1}\gamma_t}\, f(x_t \mid \gamma_t, x_{t-1}),$$

as, under (A.1) and (A.2), the likelihood can be expressed as a product
of such factors.


Step 3.  Now, having labeled the observations, estimate the distributional
parameters, and estimate the transition probabilities according to (3).

Step 4.  If no observation has changed labels from the previous iteration,
stop. Otherwise, repeat the procedure from Step 1.

Step 2 is Bayesian classification of $x_t$. Suppose the (t-1)-st
observation was tentatively classified into class c. Then the prior
probability that the t-th observation belongs to class d is $p_{cd}$,
d = 1, 2, ..., k. Hence all the techniques for classification in particular
models are available (e.g., use of linear discriminant functions when the
observations are multivariate normal with common covariance matrix).
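Steps 0 to 4 can be sketched for the simplest case treated in this paper: scalar observations that are conditionally independent given the labels, Gaussian with a common standard deviation. This is an illustrative reading of the steps, not the author's FORTRAN program. The function name `segment`, the 0-based labels, the fixed uniform `prior` for the first label, and the guards for empty classes and zero spread are all assumptions.

```python
import math


def segment(x, means, sigma, P, max_iter=50):
    """Iterate labeling (Steps 1-2) and re-estimation (Step 3) until the
    labels stop changing (Step 4). Labels are 0-based class indices."""
    k = len(means)
    means, P = list(means), [row[:] for row in P]
    prior = [1.0 / k] * k          # Step 0: f(gamma_1) = 1/k
    labels = None

    def log_p(p):
        return math.log(p) if p > 0 else float("-inf")

    def log_g(v, c):
        # Gaussian log-density up to a constant shared by all classes
        return -0.5 * ((v - means[c]) / sigma) ** 2

    for _ in range(max_iter):
        # Step 1: label the first observation.
        new = [max(range(k), key=lambda c: log_p(prior[c]) + log_g(x[0], c))]
        # Step 2: label x_t given the tentative label of x_{t-1}.
        for v in x[1:]:
            prev = new[-1]
            new.append(max(range(k),
                           key=lambda d: log_p(P[prev][d]) + log_g(v, d)))
        if new == labels:          # Step 4: nothing changed, stop
            break
        labels = new
        # Step 3: re-estimate means, common sigma, transition probabilities.
        sizes = [labels.count(c) for c in range(k)]
        if 0 in sizes:
            raise RuntimeError("a class received no observations")
        means = [sum(v for v, c in zip(x, labels) if c == j) / sizes[j]
                 for j in range(k)]
        ss = sum((v - means[c]) ** 2 for v, c in zip(x, labels))
        sigma = math.sqrt(ss / len(x)) or 1e-9   # guard against zero spread
        counts = [[0] * k for _ in range(k)]
        for a, b in zip(labels, labels[1:]):
            counts[a][b] += 1
        # A row with no observed departures keeps its previous values.
        P = [[n / sum(row) for n in row] if sum(row) else P[c]
             for c, row in enumerate(counts)]
    return labels, means, sigma, P


x = [0.1, -0.2, 0.0, 0.3, 5.1, 4.9, 5.2, 5.0, 0.2, -0.1]
labels, means, sigma, P = segment(x, means=[0.0, 5.0], sigma=1.0,
                                  P=[[0.5, 0.5], [0.5, 0.5]])
print(labels, means, sigma, P)
```

The toy series is invented. Working with logarithms leaves the maximizing label unchanged and lets a zero transition probability be handled as minus infinity.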

2.2.2.  The  first  iteration 

When the k class-conditional processes consist of independent, identically
distributed normal random variables with common variance, one
can start by choosing initial means and labelling the observations by a
minimum-distance clustering procedure. [This is one iteration of ISODATA
(Ball and Hall, 1967). One could iterate further at this stage.] From this
clustering, initial estimates of transition probabilities and the variance
are obtained.
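A sketch of this starting step, assuming scalar observations: each observation is given the label of the nearest trial mean, and a pooled standard deviation is then computed about the within-class means. The helper names are hypothetical. The data are the first eight changes listed in Table 1, and the trial means are simply the fitted k=3 means quoted in Section 2.3.1, borrowed here for illustration.

```python
import math


def nearest_mean_labels(x, means):
    """Assign each observation to the class (1-based) with the closest mean."""
    return [1 + min(range(len(means)), key=lambda c: abs(v - means[c]))
            for v in x]


def pooled_sd(x, labels, k):
    """Common standard deviation about the within-class sample means."""
    ss = 0.0
    for c in range(1, k + 1):
        members = [v for v, lab in zip(x, labels) if lab == c]
        if members:
            m = sum(members) / len(members)
            ss += sum((v - m) ** 2 for v in members)
    return math.sqrt(ss / len(x))


x = [4.0, 4.2, 10.3, 5.9, 7.6, 6.9, 1.4, -5.4]   # first eight changes, Table 1
labels = nearest_mean_labels(x, [-1.3, 6.2, 12.3])
print(labels)
print(pooled_sd(x, labels, 3))
```

For these eight values the nearest-mean labels coincide with the first eight k=3 labels reported in Section 2.3.1, although in general the later iterations may change some labels.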

2.2.3.  Restrictions  on  the  transitions 

As mentioned above, one might wish to place restrictions on the
transitions, e.g., to allow transitions only to adjacent states. (E.g.,
"recovery" is adjacent to "recession," "expansion" is adjacent to "recovery,"
but "expansion" is not adjacent to "recession.") The model does permit
restrictions on the transitions. The maximization is conducted subject to
the condition that the corresponding transition probabilities are zero. This
is easily implemented in the algorithm. Once one sets a given transition
probability at zero, the algorithm will fit no such transitions, and the
corresponding transition probability will remain set at zero at every
iteration.
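The reason a zero stays at zero can be seen in a sketch of the labeling step: a forbidden transition has score minus infinity, so it is never selected, its count remains zero, and by (3) so does its estimated probability. The means and standard deviation are the k=3 values quoted in Section 2.3.1. The matrix `P`, the short series `x` and the function name `label_sequence` are hypothetical.

```python
import math


def label_sequence(x, means, sigma, P, first_label):
    """Step-2 style labeling with 0-based labels and Gaussian classes."""
    labels = [first_label]
    for v in x[1:]:
        prev = labels[-1]

        def score(d):
            if P[prev][d] == 0.0:
                return float("-inf")   # a forbidden transition can never win
            return math.log(P[prev][d]) - 0.5 * ((v - means[d]) / sigma) ** 2

        labels.append(max(range(len(means)), key=score))
    return labels


# Adjacent-only pattern: no direct moves between states 0 and 2.
P = [[0.7, 0.3, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 0.3, 0.7]]
x = [-1.0, 12.0, 12.5, -1.5]
labels = label_sequence(x, means=[-1.3, 6.2, 12.3], sigma=2.28,
                        P=P, first_label=0)
print(labels)
```

Even though the second value lies closest to the mean of the third class, the sketch routes it through the middle class, because the direct transition is excluded.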
2.3.  An example

Here, in the context of a specific numerical example, the problem
of fitting the model for a fixed k and the problem of choice of k
will be illustrated. (The additional problem of prediction is considered
in Sclove 1981.)

Quarterly gross national product (GNP) in current (non-constant) dollars
for the twenty years 1947 to 1966 was considered. (This makes a good size
dataset for expository purposes here.) Parameters were estimated from the
first 19 years, the last four observations (1966) being saved to test
the accuracy of predictions. (See Sclove 1981.) The data and first
differences are given in Table 1. The raw series is nonstationary, so the
first differences (increases in quarterly GNP) were analyzed. (Study
of more recent data suggested use of first differences of logs; this
will be discussed in another context in a later report.) The notation
is

$$x_t = \mathrm{GNP}_{t+1} - \mathrm{GNP}_t, \quad t = 1, 2, \ldots, 79;$$

e.g., $\mathrm{GNP}_1$ is the GNP at the end of the quarter 1947-1,
$\mathrm{GNP}_2$ is that at the end of 1947-2, and
$x_1 = \mathrm{GNP}_2 - \mathrm{GNP}_1$ is the increase in GNP during the
second quarter of 1947. (A negative value of an x indicates a decrease in GNP
for the corresponding quarter.) A Gaussian model was used.
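The differencing itself is a one-line operation. The sketch below applies it to the eight GNP values for 1947-48 as printed (to the nearest billion) in Table 1. The changes listed in the table evidently come from less heavily rounded figures, so they differ slightly from these integer differences.

```python
# GNP for the eight quarters of 1947-48, as printed in Table 1.
gnp = [224, 228, 232, 242, 248, 256, 263, 264]

# x_t = GNP_{t+1} - GNP_t
x = [later - earlier for earlier, later in zip(gnp, gnp[1:])]
print(x)
```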

2.3.1.  Fitting  the  model 

In  this  section  we  discuss  the  fitting  of  a  model  with  k=3  classes, 
discussion  of  the  choice  of  k  being  deferred  to  the  next  section. 

The  three  classes  may  be  considered  as  corresponding  to  recession, 
recovery,  and  expansion,  although  some  may  prefer  to  think  of  the  segments 
labeled  as  recovery  as  level  periods  corresponding  to  peaks  and  troughs. 
The approximate maximum likelihood solution found by the iterative procedure
was (units are billions of current (non-constant) dollars) -1.3, 6.2, and 12.3
for the means, 2.28 for the standard deviation, and

    .625  .250  .125
    .156  .625  .219
    .039  .269  .692

for the transition probabilities. The estimated labels are given below;
labels (r = recession, e = expansion) resulting from fitting k=2 classes (see
Section 2.3.2) are also given.

    t = 1 to 23
    label, k=3:  22322211111333332222123
    label, k=2:  rreeeerrrrreeeeeeeerree

    t = 24 to 50
    label, k=3:  221111223222222222211233331
    label, k=2:  errrrrreeeeerrreererrreeeer

    t = 51 to 75
    label, k=3:  2321113333322223333323333
    label, k=2:  eerrrreeeeeeeeeeeeeeeeeeeee

The process was in state 1 for 21% of the time, in state 2 for 44% of the time,
and in state 3 for 35% of the time.

An interesting feature of the model and the algorithm is that, as the
iterations proceed, some isolated labels change to conform to their neighbors.
This should be the case when $p_{cc}$ is large relative to $p_{cd}$, for d
different from c.

2.3.2.  Choice  of  number  of  classes 

Various values of k were tried, the results being scored by means of
Akaike's information criterion (AIC). (See, e.g., Akaike 1981.)

To estimate k one uses the minimum AIC estimate, where

$$\mathrm{AIC}(k) = -2 \log_e[\max L(k)] + 2\, m(k).$$

Here L(k) is the likelihood when k classes are used, max denotes its maximum
over the parameters, and m(k) is the number of independent parameters when k
classes are used. The statistic AIC(k) is a natural estimate of the
"cross-entropy" between f and g(k), where f is the (unknown) true density and
g(k) is the density corresponding to the model with k classes. (See, e.g.,
the paper by Parzen in the proceedings of this Workshop for a discussion of
the cross-entropy, H(f;g).) According to AIC, inclusion of an additional
parameter is appropriate if $\log_e[\max L]$ increases by one unit or more,
i.e., if max L increases by a factor of e or more.
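The trade-off stated in the last sentence can be checked numerically. In the sketch below, `aic` follows the definition of AIC(k) directly, while `n_params_gaussian` is only one plausible way of counting m(k) for the common-variance Gaussian case (k means, one variance, k(k-1) free transition probabilities). The paper does not spell out its own count, and the log-likelihood values are invented.

```python
def aic(max_log_likelihood, n_params):
    """AIC = -2 log_e(max L) + 2 m."""
    return -2.0 * max_log_likelihood + 2.0 * n_params


def n_params_gaussian(k):
    """Assumed parameter count: k means + 1 common variance
    + k(k-1) free transition probabilities."""
    return k + 1 + k * (k - 1)


base = aic(-100.0, 5)
# One extra parameter that raises the log-likelihood by exactly one unit
# leaves AIC unchanged; a smaller gain makes AIC worse, a larger one better.
print(base, aic(-99.0, 6), aic(-99.5, 6), aic(-98.5, 6))
print(n_params_gaussian(2), n_params_gaussian(3))
```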

The model was fit with several values of k and unrestricted transition
probabilities. Also, since it seems reasonable to restrict the transitions
to those between adjacent states, such models were evaluated as well. In the
case of k=3, where the states might be considered as recession, recovery,
and expansion, this means setting equal to zero the transition probabilities
corresponding to the transitions recession-to-expansion and
expansion-to-recession. The results are given in Table 2. The best
segmentation model, as indicated by minimum AIC, is that with only two classes.

The results for k=2 classes (which might be called recession, expansion)
were 0.43 and 10.09 for the means, 3.306 for the standard deviation, and

    .667  .333
    .170  .830

for the transition probabilities. The process was in state 1 for 37%
of the time and state 2 the other 63% of the time. The labels were
given above.

A model with only two classes enjoys advantages relating to its relative
simplicity.
3.  A Model and Method for Segmentation of Digital Images

A digital (i.e., numerical) image may be considered as a rectangular
array of picture elements (pixels). These will be indexed by (i,j). At
each pixel the same p features are observed. We denote the features by

$$x_1, x_2, \ldots, x_p.$$

The vector of features is

$$X = (x_1, x_2, \ldots, x_p).$$

The digital image is

$$\{X_{ij}:\ (i, j)\ \text{ranging over the pixels of the array}\},$$

where

$$X_{ij} = (x_{1ij}, x_{2ij}, \ldots, x_{pij})$$

is the vector of numerical values of the p features at pixel (i,j).

Examples. (i) In television, we have p = 3 colors,

    $x_{1ij}$ = red level at pixel (i,j),
    $x_{2ij}$ = green level at pixel (i,j),
    $x_{3ij}$ = blue level at pixel (i,j).

(ii) In LANDSAT data, p = 4 spectral channels, one in the green/yellow
visible range, the second in the red visible range, and the other two
in the near infrared range.


An object is a set of contiguous pixels which may be assumed to
be members of a common class. One task of image processing is
segmentation, grouping of pixels with a view toward identifying
objects.

In this context the conceptual model is that the image is a set of
pixels, and, also, the image consists of several segments. Each pixel
belongs to one and only one segment. The segments fall into several
classes. For example, in a picture of a house the classes might be
brick, sky, grass, shadow and brush. Note that there might be several
separate areas of, say, grass. Each of these areas is a segment,
but they all belong to the class "grass."

The statistical model accompanying this conceptual model is as follows:

--with each class of segment is associated a probability
distribution for the feature vector X;

--with each pixel is associated a label which, were it known to us,
would tell us which class of segment the pixel belongs to.

Each pixel thus gives rise to a pair $(X, \gamma)$, where X is observable
and $\gamma$ is not. This is the same as the time-series segmentation model.
In the context of this statistical model segmentation is estimation of the
set of labels.

Often one considers parametric models, in which the class-conditional
probability functions $f_c(\cdot)$ are assumed known, except for the values
of distributional parameters. That is,

$$f_c(\cdot) = g(\cdot\,; \beta_c),$$

where $\beta_c$ is the parameter. E.g., in the multivariate Gaussian case
$\beta_c$ consists of the mean and covariance matrix for class c.

In  order  to  model  the  spatial  correlation  of  images,  one  can  assume 
that  the  labels  form  a  stochastic  process,  say  a  Markov  process. 

One  reads  through  the  array  in  a  fixed  order,  say,  first  row,  left  to 
right,  second  row,  left  to  right,  and  conditions  a  given  pixel  on 
pixels  preceding  it  in  this  ordering.  Here  a  first-order  Markov 
process  would  be  one  where  a  given  pixel  is  conditioned  on  the  pixels 
to  the  north  and  west  of  it.  Thus  the  transition  probability  matrix 
has  the  following  form  for  k=3  classes  of  segment. 
    Pixel to              Center pixel
    north   west       1           2           3

      1       1     p_{11,1}    p_{11,2}    p_{11,3}
      1       2     p_{12,1}    p_{12,2}    p_{12,3}
      1       3     p_{13,1}    p_{13,2}    p_{13,3}
      2       1     p_{21,1}    p_{21,2}    p_{21,3}
      2       2     p_{22,1}    p_{22,2}    p_{22,3}
      2       3     p_{23,1}    p_{23,2}    p_{23,3}
      3       1     p_{31,1}    p_{31,2}    p_{31,3}
      3       2     p_{32,1}    p_{32,2}    p_{32,3}
      3       3     p_{33,1}    p_{33,2}    p_{33,3}

The total number of parameters, distributional parameters and
transition probabilities, is large. But, as mentioned by Jerome Friedman
in this Workshop, with very large datasets one ought not
necessarily shy away from using models with many parameters.

The  algorithm  developed  for  segmenting  images  according  to  this 
model  is  similar  to  that  for  segmenting  time  series,  except  now  the 
transition-probability  matrix  is  more  complicated. 
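The counterpart of estimate (3) for this more complicated matrix can be sketched by counting, for each pair of north and west labels, how often each center label occurs. The helper name `nw_transition_mle` is hypothetical, and restricting the count to pixels that have both a north and a west neighbor is an assumption, since the treatment of the first row and first column is not described here.

```python
from collections import defaultdict


def nw_transition_mle(grid, k):
    """Estimate p_{nw,c} from a rectangular grid of labels 1..k.

    Returns a dict mapping (north, west) to a list of k relative
    frequencies for the center label. Only pixels having both a north
    and a west neighbor are counted (an assumption of this sketch).
    """
    counts = defaultdict(lambda: [0] * k)
    for i in range(1, len(grid)):
        for j in range(1, len(grid[0])):
            north, west, center = grid[i - 1][j], grid[i][j - 1], grid[i][j]
            counts[(north, west)][center - 1] += 1
    return {nw: [n / sum(row) for n in row] for nw, row in counts.items()}


grid = [[1, 1, 2],
        [1, 1, 2],
        [1, 2, 2]]
probs = nw_transition_mle(grid, 2)
for nw in sorted(probs):
    print(nw, probs[nw])
```

Pairs of north and west labels that never occur in the grid are simply absent from the result, which is one way of seeing why the number of parameters to be estimated is large relative to a small image.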

As  a  sample  "image"  the  Fisher  iris  data  were  used.  This  dataset 
consists of 4 features measured on 150 flowers, 50 in each of three species.
To  form  an  image  the  150  flowers  were  arranged  into  a  15  x  10  rectangular 
array,  rows  1-5  being  species  1,  rows  6-10  being  species  2,  rows  11-15  being 
species  3.  This  means  that  the  true  segmentation  is  as  follows. 
TRUE SEGMENTATION:

              COLUMN:
    ROW:    1  2  3  4  5  6  7  8  9 10

      1     1  1  1  1  1  1  1  1  1  1
      2     1  1  1  1  1  1  1  1  1  1
      3     1  1  1  1  1  1  1  1  1  1
      4     1  1  1  1  1  1  1  1  1  1
      5     1  1  1  1  1  1  1  1  1  1
      6     2  2  2  2  2  2  2  2  2  2
      7     2  2  2  2  2  2  2  2  2  2
      8     2  2  2  2  2  2  2  2  2  2
      9     2  2  2  2  2  2  2  2  2  2
     10     2  2  2  2  2  2  2  2  2  2
     11     3  3  3  3  3  3  3  3  3  3
     12     3  3  3  3  3  3  3  3  3  3
     13     3  3  3  3  3  3  3  3  3  3
     14     3  3  3  3  3  3  3  3  3  3
     15     3  3  3  3  3  3  3  3  3  3
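The arrangement just described can be written down in a few lines. The sketch assumes the flowers are laid into the array row by row in their usual order (flower 1 at row 1, column 1). The text specifies only which rows belong to which species, so the column order and the helper names are assumptions.

```python
def true_grid(n_rows=15, n_cols=10, rows_per_species=5):
    """True species label (1, 2 or 3) for every pixel of the array."""
    return [[r // rows_per_species + 1] * n_cols for r in range(n_rows)]


def pixel_of(flower, n_cols=10):
    """(row, column), both 1-based, of a 1-based flower index,
    assuming row-by-row filling of the array."""
    return (flower - 1) // n_cols + 1, (flower - 1) % n_cols + 1


grid = true_grid()
for row in grid:
    print(" ".join(str(label) for label in row))
print(pixel_of(50), pixel_of(100), pixel_of(150))
```

Under this assumed ordering, flowers 50, 100 and 150 fall at the end of rows 5, 10 and 15 respectively.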


Below  are  given  results  obtained  by  starting  with  initial  means  equal 
to  the  measurements  on  flowers  50,  100  and  150.  (These  are  easy  for 
the  algorithm  in  the  sense  that  they  are  in  fact  from  the  three 
different species, but not so easy as, e.g., flowers 1, 51 and 101,
which  are  further  apart.  Starting  with  means  that  are  from  correct 
classes  is  analogous  to  military  applications  where  something  is  known 
about  the  physical  characteristics  of  target,  background,  and  clutter.) 
The  results  in  successive  iterations  were  as  follows.  Convergence  was 
reached  on  the  fourth  iteration,  i.e.,  on  that  iteration  no  pixel 
changed  class. 
SEGMENTATION ON ITERATION 1:

              COLUMN:
    ROW:    1  2  3  4  5  6  7  8  9 10

      1     1  1  1  1  1  1  1  1  1  1
      2     1  1  1  1  1  1  1  1  1  1
      3     1  1  1  1  1  1  1  1  1  1
      4     1  1  1  1  1  1  1  1  1  1
      5     1  1  1  1  1  1  1  1  1  1
      6     3  3  3  2  3  2  3  2  3  2
      7     2  2  2  3  2  2  2  2  2  2
      8     3  2  3  2  2  2  3  3  2  2
      9     2  2  2  3  2  3  3  2  2  2
     10     2  3  2  2  2  2  2  2  2  2
     11     3  3  3  3  3  3  2  3  3  3
     12     3  3  3  3  3  3  3  3  3  3
     13     3  3  3  3  3  3  3  3  3  3
     14     3  3  3  3  3  3  3  3  3  3
     15     3  3  3  3  3  3  3  3  3  3

CONFUSION MATRIX:

                   True Class
                  1     2     3   |  Total
    Label  1     50     0     0   |    50
           2      0    36     1   |    37
           3      0    14    49   |    63
                 ------------------------
    Total        50    50    50   |   150

SEGMENTATION ON ITERATION 2:

              COLUMN:
    ROW:    1  2  3  4  5  6  7  8  9 10

      1     1  1  1  1  1  1  1  1  1  1
      2     1  1  1  1  1  1  1  1  1  1
      3     1  1  1  1  1  1  1  1  1  1
      4     1  1  1  1  1  1  1  1  1  1
      5     1  1  1  1  1  1  1  1  1  1
      6     2  3  3  2  3  2  3  2  3  2
      7     2  2  2  2  2  2  2  2  2  2
      8     3  2  2  2  2  2  2  3  2  2
      9     2  2  2  3  2  2  2  2  2  2
     10     2  2  2  2  2  2  2  2  2  2
     11     3  3  3  3  3  3  2  3  3  3
     12     3  3  3  3  3  3  3  3  3  3
     13     3  3  3  3  3  3  3  3  3  3
     14     3  3  3  3  3  3  3  3  3  3
     15     3  3  3  3  3  3  3  3  3  3

CONFUSION MATRIX:

                   True Class
                  1     2     3   |  Total
    Label  1     50     0     0   |    50
           2      0    42     1   |    43
           3      0     8    49   |    57
                 ------------------------
    Total        50    50    50   |   150

SEGMENTATION ON ITERATION 3:

              COLUMN:
    ROW:    1  2  3  4  5  6  7  8  9 10

      1     1  1  1  1  1  1  1  1  1  1
      2     1  1  1  1  1  1  1  1  1  1
      3     1  1  1  1  1  1  1  1  1  1
      4     1  1  1  1  1  1  1  1  1  1
      5     1  1  1  1  1  1  1  1  1  1
      6     2  3  2  3  2  3  2  3  2  3
      7     2  2  2  2  2  2  2  2  2  2
      8     3  2  2  2  2  2  2  2  2  2
      9     2  2  2  2  2  2  2  2  2  2
     10     2  2  2  2  2  2  2  2  2  2
     11     3  3  3  3  3  3  3  3  3  3
     12     3  3  3  3  3  3  3  3  3  3
     13     3  3  3  3  3  3  3  3  3  3
     14     3  3  3  3  3  3  3  3  3  3
     15     3  3  3  3  3  3  3  3  3  3

CONFUSION MATRIX:

                   True Class
                  1     2     3   |  Total
    Label  1     50     0     0   |    50
           2      0    44     0   |    44
           3      0     6    50   |    56
                 ------------------------
    Total        50    50    50   |   150

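The confusion matrix reported for iteration 3 can be recomputed from the label array by cross-tabulating assigned label against true class. The sketch below enters the iteration-3 labels as they are printed (only rows 6 and 8 differ from the true segmentation). The function name `confusion` is a hypothetical helper.

```python
def confusion(labels, truth, k):
    """table[a-1][b-1] = number of pixels labeled a whose true class is b."""
    table = [[0] * k for _ in range(k)]
    for label_row, truth_row in zip(labels, truth):
        for lab, tru in zip(label_row, truth_row):
            table[lab - 1][tru - 1] += 1
    return table


truth = [[r // 5 + 1] * 10 for r in range(15)]     # rows 1-5, 6-10, 11-15
iter3 = [row[:] for row in truth]
iter3[5] = [2, 3, 2, 3, 2, 3, 2, 3, 2, 3]          # row 6 as printed
iter3[7] = [3, 2, 2, 2, 2, 2, 2, 2, 2, 2]          # row 8 as printed

table = confusion(iter3, truth, 3)
for row in table:
    print(row, sum(row))
correct = sum(table[c][c] for c in range(3))
print(correct, "of 150 pixels labeled with their true class")
```

Rows of the table are assigned labels and columns are true classes, matching the layout of the printed confusion matrices.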

All  computations  reported  here  were  carried  out  using  FORTRAN  computer 
programs  written  by  the  author.  These  programs  have  been  sent  to  the 
Statistics  Program  at  ONR  for  deposit  in  NRL. 
TABLE 1.  GNP.  Units: billions of current (non-constant) dollars
(from Nelson (1973), pp. 100-101)

                                       Quarter
                        1      2      3      4      1      2      3      4

    1947-48  GNP      224    228    232    242    248    256    263    264
             change   4.0    4.2   10.3    5.9    7.6    6.9    1.4   -5.4

    1949-50  GNP      259    255    257    255    266    27?    293    305
             change  -3.3    1.9   -2.1   11.0    9.4   17.7   11.4   13.?

    1951-52  GNP      318    326    333    337    340    339    346    358
             change   7.8    7.0    4.1    2.6   -0.4    6.5   12.1    6.5

    1953-54  GNP      364    368    366    361    361    360    365    373
             change   3.3   -1.7   -5.0   -0.1   -0.3    4.3    8.7   12.8

    1955-56  GNP      386    394    403    409    411    416    421    430
             change   8.2    8.1    6.3    1.8    5.6    4.4    8.9    7.4

    1957-58  GNP      437    440    446    442    435    438    451    464
             change   3.0    6.4   -4.8   -6.8    3.6   13.1   13.0    9.6

    1959-60  GNP      474    487    484    491    503    505    504    503
             change  12.9   -2.9    6.5   12.5    1.7   -0.5   -0.9    0.3

    1961-62  GNP      504    515    524    538    548    557    564    572
             change  11.3    9.3   13.5   10.1    9.4    7.2    7.6    5.4

    1963-64  GNP      577    584    595    606    618    628    639    645
             change   6.8   10.5   11.1   11.9   10.3   10.9    6.2   17.7

    1965-66  GNP      663    676    691    710    730    743    756    771
             change  12.9   15.4   18.9   19.5   13.8   12.6   14.8   13.5


Sclove:  On  Segmentation  of  Time  Series  and  Images 




TABLE 2.  Fitting models

    Model                                                        AIC

    Segmentation, 2 classes                                      481.4 (a)
    Segmentation, 3 classes, full trans. prob. matrix            483.6
    Segmentation, 3 classes, sparse trans. prob. matrix (b)      488.5
    Segmentation, 4 classes, full trans. prob. matrix            507.1
    Segmentation, 4 classes, sparse trans. prob. matrix (b)      486.8
    Segmentation, 5 classes, full trans. prob. matrix            506.5
    Segmentation, 5 classes, sparse trans. prob. matrix          stopped (c)
    Segmentation, 6 classes, full trans. prob. matrix            stopped (c)

a.  Optimum, among segmentation models considered.

b.  Allows transitions only to adjacent states.

c.  Stopped, i.e., the algorithm reached an iteration where it allocated
no observations to one of the classes.